Hardware Conditioned Policies for Multi-Robot Transfer Learning

Neural Information Processing Systems

Deep reinforcement learning can be used to learn dexterous robotic policies, but transferring them to new robots with vastly different hardware properties is challenging. It is also prohibitively expensive to learn a new policy from scratch for each robot's hardware due to the high sample complexity of modern state-of-the-art algorithms. We propose a novel approach called Hardware Conditioned Policies, in which we train a universal policy conditioned on a vector representation of robot hardware. We consider simulated robots with varied dynamics, kinematic structure, kinematic lengths, and degrees of freedom. First, we use the kinematic structure directly as the hardware encoding and show strong zero-shot transfer to completely novel robots not seen during training. For robots with lower zero-shot success rates, we also demonstrate that fine-tuning the policy network is significantly more sample-efficient than training a model from scratch. In tasks where knowing the agent dynamics is important for success, we learn an embedding for robot hardware and show that policies conditioned on this hardware encoding tend to generalize and transfer well. Videos of experiments are available at: https://sites.google.com/view/robot-transfer-hcp.
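The core idea of the abstract — a single universal policy that receives a hardware encoding as an additional input alongside the robot state — can be sketched as a simple forward pass. This is an illustrative sketch only: the dimensions, the small MLP architecture, and the random "hardware vectors" standing in for kinematic encodings are assumptions, not the paper's actual network.

```python
import numpy as np

def hardware_conditioned_policy(state, hw_vec, weights):
    """Minimal hardware-conditioned policy: the robot state is
    concatenated with a hardware encoding (e.g. a vector derived
    from the kinematic chain) before passing through an MLP."""
    x = np.concatenate([state, hw_vec])
    for W, b in weights[:-1]:
        x = np.tanh(W @ x + b)      # hidden layers
    W, b = weights[-1]
    return W @ x + b                # action vector (linear output)

def init_mlp(sizes, rng):
    """Random MLP weights for the given layer sizes (toy init)."""
    return [(rng.standard_normal((o, i)) * 0.1, np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
state_dim, hw_dim, action_dim = 10, 6, 4
weights = init_mlp([state_dim + hw_dim, 32, action_dim], rng)

state = rng.standard_normal(state_dim)
hw_a = rng.standard_normal(hw_dim)   # stand-in encoding of robot A
hw_b = rng.standard_normal(hw_dim)   # stand-in encoding of robot B

# The same policy weights produce different actions for the same
# state when conditioned on different hardware encodings.
action_a = hardware_conditioned_policy(state, hw_a, weights)
action_b = hardware_conditioned_policy(state, hw_b, weights)
```

In the paper's setting the conditioning vector is either the kinematic structure itself or a learned hardware embedding; here a random vector merely illustrates the input plumbing.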


Reviews: Hardware Conditioned Policies for Multi-Robot Transfer Learning

Neural Information Processing Systems

Disclaimer: my background is in control theory, and only recently have I invested most of my time in reading and doing research in machine learning and reinforcement learning, with a specific focus on robotics and control. I went through the submitted paper carefully, including the supplementary material. Therefore I am quite confident in my assessment, especially since the addressed problem is well inside my core expertise (adaptive control). As I said previously, I am very confident with the problem itself, and less confident with the theoretical framework (reinforcement learning) used to solve it. The math presented in the paper is relatively shallow, and I checked it carefully.


Hardware Conditioned Policies for Multi-Robot Transfer Learning

Tao Chen, Adithyavairavan Murali, Abhinav Gupta

Neural Information Processing Systems
